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Abstract. We consider the problem of solving TAP mean field equations by iteration 
for Ising models with coupling matrices that are drawn at random from general invariant 
ensembles. We develop an analysis of iterative algorithms using a dynamical functional 
approach that in the thermodynamic limit yields an effective dynamics of a single variable 
trajectory. Our main novel contribution is the expression for the implicit memory term 
of the dynamics for general invariant ensembles. By subtracting these terms, that depend 
on magnetizations at previous time steps, the implicit memory terms cancel making the 
iteration dependent on a Gaussian distributed field only. The TAP magnetizations are 
stable fixed points if an AT stability criterion is fulfilled. We illustrate our method 
explicitly for coupling matrices drawn from the random orthogonal ensemble. 
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1. Introduction 

TAP equations provide generalized mean held equations for statistical physics models with 
random, inhnite range interactions which (under certain conditions) are assumed to be 
exact in the limit of an inhnite system [1]. In recent years there has been an increasing 
interest in such equations within and also outside of the statistical physics community. 
This is partly due to the fact that the TAP approach can be applied to statistical inference 
in probabilistic models in information theory [2], [3], statistics [4] and machine learning [5], 
[6]. Originally developed by D.J. Thouless, P. Anderson and R. Palmer for the Sherrington 
Kirkpatrick (SK) model of an Ising spin-glass [7], the TAP approach has been generalised 
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to a variety of other problems [4], This includes models with continuous variables rather 
than Ising spins but also cases where the independent random interactions are replaced by 
other, more structured statistical ensembles that allow for certain dependencies. 

While methods for deriving the TAP approach for different models are now well 
established it is not necessarily clear how the resulting system of nonlinear equations can 
be solved efficiently. A naive algorithm based on a simple iteration of the equations usually 
fails to achieve convergence. This problem has been addressed by a paper of Bolthausen 
for the case of the SK model [8]. He has analyzed the dynamics of iterations rigorously and 
shown how the iterations can be altered in order to achieve exponential convergence (above 
the so-called AT line of stability). Other ideas to arrive at a convergent method are based 
on taking the limit of dense couplings in belief propagation algorithms, see [2], [9], and 
[10], (for rigorous analyses [3] and [11]). Unfortunately, for this approach it is necessary 
to augment the original variables by auxiliary ones, such that the interactions in the new 
model are independent. For example, the Hopfield model can be represented by a bipartite 
graph of Ising spins and continuous variables. It is not clear how such a method should 
be set up for a matrix of interactions with more general statistical dependencies. Also, 
taking the dense coupling limit of an approximate message passing (AMP) algorithm valid 
for sparse coupling problems will not always lead to the correct dense coupling algorithm. 
Here we will construct a theory for dynamics using dense couplings as the starting point. 

In this paper we will address the problem of solving TAP equations for Ising models 
with dense random coupling matrices with a general invariant probability distribution. Our 
analysis is based on dynamical mean field theory which allows to study the dynamics of 
iterative algorithms in the thermodynamic limit by a suitable average over the ensemble 
of couplings. It turns out that by including certain memory terms in the iteration, the 
effective field in the dynamics becomes a simple Gaussian random variable suggesting that 
the dynamics might converge. The explicit form of the memory terms depends explicitly 
on the statistical ensemble of couplings. We show that our method reproduces previous 
convergent algorithms for SK and Hopfield models. We also work out the details of our 
theory for the spin model with orthogonal random couplings [12, 13]. Simulations of the 
resulting algorithms show exponential convergence above a line of stability which can be 
identified with the so-called AT line. 

The paper is organized as follows: in Section H we introduce the general random 
matrix formulation for the TAP equations. In Section HI we present the results of 
dynamical functional theory. In Section IV, we introduce “the single-step memory 
construction” iterative algorithm for solving TAP equations. Section V is devoted to 
the derivation of the AT stability condition. Discussions and outlooks are presented in 
Section VI. Lengthy technical derivations are deferred to the Appendix. 
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2. General Invariant Random Matrix Ensembles 
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We will consider Ising models with pairwise interactions given by the Gibbs distribution 
for the spins S = {Si,..., S^) 


F(S) = -exp 


N 


N 




i<j 


( 1 ) 


with Z denoting the normalization constant. We are interested in the case, where the 
matrix J is random with the condition that the marginal density of couplings p{Jij) is 
the same for all pairs {i,j) but couplings might be dependent random variables. A simple 
way for dehning such class of random matrices is via the so-called invariant ensembles 
[14]. A random matrix J is called invariant if it has the same probability distribution as 
V^JV for an orthogonal matrix V which is independent of J . Equivalently, it admits 
the spectral decomposition 

J = 0^\0 (2) 

where O is Haar distributed (i.e. it is a random orthogonal matrix) and independent of 
the diagonal matrix A. This characterization of invariant matrices involves the diagonal 
elements Ju which is absent in (1). However, one can show that as N tends to inhnity if the 
spectrum of J converges almost surely to a compactly supported distribution such that the 
smallest and largest eigenvalue of J converge almost surely to the inhmum and supremum 
of the support, respectively, we have (in the almost sure sense) Ju — j^tT{J) 0, Vz as N 
tends to inhnity, see Appendix A. In other words the diagonal elements of an invariant 
matrix converge to the same deterministic limit. Thus, we may dehne asymptotically 
invariant couplings J as Jij = (3Jij for i j and Ju = 0, where we include an inverse 
temperature factor fl in the dehnition. The couplings for the standard SK and Hopheld 
models belong to this class of matrices for {—J) being the Gaussian Wigner matrix and 
the null Wishart matrix (i.e. a sample covariance matrix of independent Gaussian random 
vectors whose entries are zero mean, independent and identical distributed), respectively. 


2.1. The Generating Function and the R-transform 

We will later need the generating function of asymptotically invariant random matrices 
given by [13, 15] 

with symmetric matrix Q having a hnite rank. Since we believe that our paper might be 
of interest to researchers with an information theory background, we will briehy mention 
how G is related to quantities which are well known in the theory of free probability [16], 
which is a powerful approach to random matrix theory. Setting 

G{x) = i r doj R(lo) (4) 


t Here (d^ denotes transposition. 
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one can show that R equals the so-called R-transform (in the theory of free probability) of 
the limiting spectrum. Its formal dehnition can be given in terms of the Cauchy transform: 
let P denote the limiting spectrum of J. Moreover let 


M(2) 


A 


1 


+ 5 : 

n=l 


mn 

Zn+1 ’ 


( 5 ) 


where is nth order moments of the limiting spectrum P, i.e. = J dP(A) A”. Then 
the R-transform of P is given by 

R(a:) = — — (6) 

X 

with denoting the composition inverse of M. It admits the power series expansion 

CX) 

R(x) = ^ , (7) 

n=l 


where c„ are known as the free cumulants of P. For example, the first two free cumulants 
Cl and C 2 are the mean and variance of the distribution P, respectively, i.e. Ci = mi and 
C 2 = m 2 — rn\. For details we refer the reader to [17]. 

In the sequel we give the R-transforms for some random matrix ensembles that we 
will discuss later. We first point out the simple identity 


R(x) = /9(R(/3x) — Cl) . 


( 8 ) 


In this expression R denotes the R-transform of the limiting spectrum of J and Ci = R(0). 
We next provide the explicit form of R(a:) for the SK, Hopfield and random orthogonal 
[12, 13] models, respectively: i) For the SK case we have a (symmetric) matrix J 
whose diagonal entries are zero and the upper-triangle entries are independent identically 
distributed (iid) Gaussian with zero mean and variance 1/N. Note that in this case we 
have J = fIJ] so that R(a;) = [17]; ii) For the Hopfield model, we consider entries 

of an (N/a) x N matrix H that are iid and Gaussian with zero mean and variance a/N. 
Moreover let J = —H^H, i.e. {—J) is a null Wishart matrix whose limiting spectrum is 
given by the Marcenko-Pastur distribution. By invoking the R-transform of the Marcenko- 
Pastur distribution, see e.g. [17], we have 


R(a:) 


13‘^ax 
1 -|- ftax' 


( 9 ) 


iii) Finally, for the random orthogonal case we consider a spectral decomposition (2) such 
that J = O^AO. Here the diagonal entries of the diagonal matrix A are composed of ±1 
such that tr(A) = 0. Then, one can easily show that 


— 1 -|- \/l + 


R(a:) 

The latter result was given in [12]. 


2x 


( 10 ) 
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2.2. TAP Equations for General Invariant Gouplings 

TAP equations are a set of self-consistent equations for the vector of magnetisations 
m = (S) where the brackets denote expectation w.r.t. the Gibbs distribution (1). For a 
general invariant ensemble they have been derived hrst in [12] using the large N scaling of 
a perturbation expansion. A second derivation using the cavity method and the large N 
limit of the ‘adaptive’ TAP equations can be found [4]. We have provided a more rigorous 
derivation of the transition from “adaptive TAP” to the self-averaging limit using random 
matrix theory in Appendix B. The resulting TAP equations read 

m = tanh('i/?) (11) 

= h + Jm — R(1 — q)m (12) 

where q = and h is the vector of nonrandom external helds. Note, that the only 

dependency on the random matrix ensemble is via the R-transform R(1 — q) in the so- 
called Onsager term which is a correction to the naive mean held term Jm. One can show 
that Tj is the mean of the cavity held. Furthermore, following the calculations of [4] one 
hnds that Tj is Gaussian distributed (in the large N limit) with respect to the random 
couplings J with mean and variance 

{{^,-h,Y) = qR'{l-q). (13) 

Hence, the subtraction of the Onsager term R(1 — q)m from the mean held Jm makes 
the remainder Gaussian. We will next transfer the idea of a Gaussian held from the static 
solutions to the dynamics of an algorithm. 

3. The Results of Dynamical Functional Theory 

Dynamical properties of disordered systems can be computed by the method of dynamical 
functionals [18]. In the limit N ^ oo this method provides us with exact results for the 
marginal distribution of a trajectory of a single variable (in our case a magnetization mj(t)), 
when we dehne a dynamics of an algorithm for the solution of the TAP equations. As a 
typical result of such a calculation one hnds that the ‘held’ Jm{t) becomes a sum of a 
Gaussian term and a memory term which includes the magnetizations at all previous times. 
This memory often makes the dynamics of disordered systems highly complex allowing e.g. 
for a persistent dependency on the initial conditions and thus a failure to converge to 
a unique hxed point. Hence, we propose to introduce explicit memory terms which are 
chosen to cancel the implicit memory terms derived from the dynamical functional theory. 
In such a way, at each time step, the update of the magnetization for the algorithm 
involve a Gaussian distributed random held only and we expect that we might obtain 
good convergence results. This Gaussian property of the ehective dynamical held was 
already shown for a Hopheld model in [2] and [9] (and proved in [3]) and reappeared in 
Bolthausen’s iterative construction of solutions to the TAP equations for the SK model in 
[ 8 ]. 
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We start with defining a set of dynamical equations which could serve as a candidate 
algorithm for solving the TAP equations for invariant random coupling matrices 


■mit) = ft ({7(r),m(r)}t=o) 
7 (t) = h + Jm{t) 


(14) 

(15) 


for r = 0,... which depend on the field Jm{t) and the previous local magnetizations 
m{T). Here ft is an appropriate sequence of non-linear scalar functions. Our goal is to get 
the statistics of a single trajectory of (14)-(15), when J is a random matrix with generating 
function (3). To do so we make use of the dynamical functional theory (DFT) analysis as 
described in [19, 20] which is a discrete time version of the method of [18]. We also refer 
the reader to [21] where DFT was used to analyze the AMP algorithm in the context of 
the CDMA communication algorithm. 

We introduce the generating functional corresponding to the dynamics (14)-(15) as 


, T-l 


Z{{l{t)}) = / n d{m{t) - ft ({7(r), m(r)}t=o)) 

5(7(f) -h- 


t=o 




(16) 


Notice that Z{{l{t) = 0}) = 1. The statistics of the variables can be computed from 
the averaged generating functional {Z{{l(t)}))j. In the large N limit we obtain that 
(see Appendix C) 


N 


T-l 


{z({i(i)})).-n/ 

n=l t=0 

i fin)!) - K -'E,S(t,s)mJs) - (>„(!) j 

( 17 ) 

with fi, S) denoting the multivariate normal distribution with mean jj, and covariance 
S. This result shows that in the large N limit single trajectories can be treated as 
independent following the effective stochastic dynamical process given by 


= ft ({7(r),m(r)}tJo) 

t-l 

7(i) = h + ^ 6(i, s)m(T) + </)(i) 


(18) 

(19) 


T = 0 


Here (p{t) is a vector of independent Gaussian random variables with covariance matrix 
given by 


n—2 




n—2—k 


n=l k=0 


( 20 ) 
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where Q and C are T x T the response and the correlation matrices, respectively. With 
slight abuse of notation, their (t + 1, r + 1) indexed entries are given by 



^ ^ {'mi{t)mi{T))^.. ( 22 ) 

i=l 


Moreover the specific random matrix ensemble enters the result through the coefficients 
Cn, see (7), and the memory matrix Q given by 

g = R(ei). (23) 

So far we have not yet referred to the TAP equations in the DFT analysis. Instead 
we have considered a somewhat general dynamical system with disorder and memory. 
Such a formulation gives us enough freedom to construct a convenient dynamics which 
asymptotically converges to the solution of the TAP equations. We will dehne the dynamics 
to be of the form 

m{t + 1) = tanh('0(f)) (24) 

where the variables 'ijji{t) must be chosen to become independent Gaussian fields in the 
resulting effective single variable dynamics (19). In fact, there are actually various methods 
for doing so. In the sequel we will limit our attention to a method that we call the single 
step memory construction. 

4. The Single Step Memory Construction 

In the single step memory algorithm we will construct the update in such a way that the 
resulting memory term (23) satisfies the equation 

^(f, r) = 0,Vr 7 ^ t - 1. (25) 

Hence, if (25) holds, then using (19) we find that the variable 

lit) —Q{t,t — l)m{t — 1) = (fit) + h (26) 

becomes a a Gaussian held. We will choose the held 'if it) in (24) as a linear combination 
of the Gaussian helds 0('r), r = 1,..., f of the form 

t 

-Ifit) = '^Ait + l,T)i(fiT)+ h) (27) 

T=0 

where we have to construct the non-random terms Ait + 1, r) to make the dynamical order 
parameters consistent with the single step memory condition (25). This condition leads 
to a very simple result for the response function (21) because there is no complicated 
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propagation in time of a response to an external field. In fact, from (24) we obtain for the 
response function (21) 





ml{t)) 


d'4)i{t - 1 ) \ 

defier) / 


(1 - q{t))A{t,T) 


(28) 

(29) 


with q{t) = ■Am{tym{t). Thus we have the explicit result 


A{t,T) 


1 - q{t) ■ 


(30) 


Finally, using (23) we get an explicit result for the response function in terms of the memory 
terms Q{t,t — 1). Note that by construction of the single step memory matrix Q (25) we 
can write (23) as 

t 

g{t,T) = at-r ^(s, s-1), (31) 

s=t4-1 

where the coefficients a„ are obtained from the power series expansion of the composition 
inverse of the R-transform: 

OO 

R“^(x) = . (32) 

n=l 

By dehnition the trace of J is zero, i.e. R(0) = 0. Hence, the power series expansion in 
(32) starts from the hrst order term. 

To complete the speciheation of the single step memory construction we only need to 
specify Q{t,t — 1). This will be chosen such that the method is asymptotically consistent 
with the static TAP equations. Specihcally, from (12) we should have 


lim Q(t,t — 1) = R(1 — q). (33) 

t—)-oo 


We choose the explicit form 

g(t.i-l)= (34) 

which assuming convergence q(t) —?■ g as f —?■ cx) leads to (33). This form has also the 
advantage, that in (30), for the update an unwanted factor 1 — q(t + 1), which would make 
'0(t) depending on the future state m{t + 1), cancels. 


4.1. Summary 

Putting everything together the single-step memory algorithm for f > 0 is dehned as 

m{t -I- 1) = tanh('i/>(f)) (35) 

t 

il){t) = Q{t) ^ at+i-ru{T) (36) 

r=0 


u{t) 


h -|- Jm(t) — g{t,t — l)m{t — 1) 
Q(t- 1)(1 -q(t)) 


(37) 
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where we introduce 

t 

Q{t) = n - ^(^)) = - ^(^)) (38) 

T=0 

such that Q(—1) = 1. The memory term Q{t,t — 1) is given by (34). Moreover the 
algorithm initializes with m{t) = 0 for t G {—1, 0}. 


4-2. Asymptotic Consistency with TAP Equations 

In the sequel we show that if the single step memory algorithm (35)-(37) converges, it 
solves the TAP equations (11)-(12). Let us assume that m{t) —?■ m as t tends to inhnity. 
To have the convergence to the TAP equations we solely need to show that the sum in (36) 
converges to the proper limit. From (27) and (30) we must have 


«(i)E 




(1-^(^))Q(^-1) 


''^G(t + 1,r) ^ 1. 


(39) 


We make the so-called weak long-term response assumption [22] that 


lim G(t,T) = 0, V finite r. (40) 

t^OO 

Hence, for sufficiently large t and r' < t such that tfr' being finite as t ^ oo, we can write 


t t 


^ g{t + 1, r) ~ ^ g{t + 1, r) 

T=0 r=T^ 

( 41 ) 

t 

~ at+i_^R (1 - 

t=t' 

( 42 ) 

oo 

~ ^a„R(l - qY 

71— 1 

( 43 ) 

= R-\R{l-q)) = l-q. 

( 44 ) 


Next we will provide the details of the single-step memory algorithm for the SK, Hopheld 
and random orthogonal models. 


4-3. Example 1 The SK-Model 

Recall that, for the standard SK model we have R(a;) = so that R“^(x) = x/(3^. 
Hence, oi = 1/ff^ and = 0 for n > 1. Thus, the single-step memory algorithm may 
written as 


m(t -|- 1) = tanh(i/?(t)) 

'if{t) = h + Jm(t) — /3^(1 — q{t))m(t — 1). 


( 45 ) 

( 46 ) 
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At first glance, these dynamical equations are similar but not exactly equal to those 
proposed by Bolthausen [8]. The difference is that instead of the dynamical order 
parameter q{t) the hxed point solution of q appears. Using the explicit form of the 
covariance of the held 'ifi{t) given by (60) in the next section, one hnds for the held 
variance {{'ipi{t) — {'tpi{t)))^) = ff^qit). Hence, if we start the iteration (as in [8]) with 
mj(l) = such that q{l) = g, then we hnd that in the large N limit, we also have 
q{t) = (^tanh'^{^jJi{t — 1)) = g for all times t and we get agreement with [8]. 


4 - 4 - Example 2 The Hop field Model 
For the Hopheld model from (9) we have 


R ^(x) 


1 X 
(3a (3 — X 


(47) 


Thus the memory coefficients are given as = l/{a(3'^^^) for n > 1. In the sequel we 
show that the single step memory algorithm for the Hopheld model coincides with AMP 
algorithm which was introduced in the context of the CDMA problem in [2] and compressed 
sensing in [9]. From (36) we hrst write 


r=0 

^ ^ - !)• 
For convenience let us introduce 

A R(1-9(^)) ^ /3 

/3a(l - g(t)) 1 +/3a(l - g(t)) 


Notice that from (37) we may write (49) in the form of 




1 


A{t)[h + Jm{t)] + a(l — q{t))A{t)['if{t — 1) — A{t — l)m{t — 1)]. 


(48) 

(49) 

(50) 

(51) 


Then, dehning z(t) = 'ip{t) — A(t)m{t), we write the single step memory algorithm as 


m{t + 1) = tanh(z(t) + A(t)m{t)) (52) 

z{t) = ^A{t)[h + (J - (31)m{t)] + a{l - q{t))A{t)z{t - 1) (53) 

where I is the identity matrix of appropriate dimension. Note, that (I—J / (3) asymptotically 
coincides with the corresponding central Wishart matrix (see Section 2). Thereby we 
exactly obtain the AMP iteration steps as introduced in [2]. We also refer the reader to 
the related works [3] and [21], where the dynamics of the AMP algorithm is analyzed by 
means of DFT and Bolthausen’s conditioning technique [8], respectively. 
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Bolthausen’s conditioning technique for SK model [8] and Hopfield model [3] are based 
on the assumption that the entries of the underlying coupling matrix are dehned via zero- 
mean iid and Gaussian distributed random variables, see Section 2.1. Recently, it has 
been shown in [11] that the same analyses can be obtained without the need of Gaussian 
distribution assumption but a sub-Gaussian tail condition of the distribution is required. 
Indeed, thanks to the central limit theorem, one can show that the generating function (3) 
also yields the same result regardless of whether the Gaussian distribution assumption is 
considered or not, see [23, Section 5]. 


4-5. Example 3 The Random Orthogonal Model 


For the random orthogonal model from (10) we have R ^(x) 
the memory coefficients as 




n is odd 
0 n is even. 


x/(/32-x2). This yields 


(54) 


In Figure 1 and Figure 2 we illustrate the convergence of the single-step memory 
algorithm for the random orthogonal model obtained by running simulations. Notice that 
after few iteration steps the convergence becomes exponentially fast. The flat lines around 
(—300)dB, i.e. 10“^°, are the consequence of the machine precision of the computer which 
was used. 

The convergence improves with increasing temperature parameter 1//3. On the other 
hand, for large enough /3, the algorithm fails to converge. To estimate the critical parameter, 
we study the inverse decay time measured by the angle 9 as illustrated in Figure 1 and 
extrapolate the simulational data to 6^ = 0 using a convenient range of (3. One might expect 
that the critical jS would coincide with the one obtained from a de Almeida-Thouless (AT) 
stability condition which can also be derived from the TAP approach [4]. The AT line is 
given by the equation 


aR'(l — q) 


1 with 



N 


((1 - tatih“(^j))^) 


(55) 


where the random variable ifi is a Gaussian with mean h^ and variance gR'(l—g). Note, that 
[4] contains a typo in the corresponding expression. In Figure 3, we present a comparison 
between the simulations and (55). This coincidence can be understood from a dynamical 
point of view by analyzing the stability of the dynamics close to the hxed point. The 
details will be postponed to Section V. Finally, it also worth noting that the trajectories 
of the algorithm show a self-averaging behaviour above the AT line for large N. On the 
other hand, we hnd that below the AT line, there are strong sample to sample fluctuations. 
However, by averaging order parameters over many samples, we get a good agreement with 
the theory. Since the main goal of this paper is to present a convergent algorithm we will 
leave a more careful investigation of this point to future publications. 
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Figure 1: Random orthogonal model with fl G {1,3, ...,9,11}, hi = 1 and N = 2^^. Here 
e.g. 9 G {^ 3 , 6 * 5 } substitutes the respective linear decay time (in the log-domain), i.e. the 
convergence is faster bigger 9 is. 



Figure 2: Random orthogonal model with f3 G {1,...,5,7}, hi = 2 and N = 2^“^. The 
inverse temperature (3 = 6.9 gives the AT line. 
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Figure 3: Consistency of the diverging decay time with the AT line: The simulations were 
based on the average over ten realizations of J with N = 2^^. 


f.6. The Field Covariance Matrix 


In order to compare simulations of systems with analytical results obtained from the 
dynamical functional approach in the limit N ^ oo and to study the stability of TAP 
hxed-points we have to perform expectations over the Gaussian random variables 'ipi{t) 
(see (27)). Specihcally we write 

^(||m(t) - m{t-l)f)j = q{t) + q{t-l) -2C{t,t- 1) (56) 

where the order parameter C is given by 


1 J ^ f 

C(t + 1, F + 1) = — J dP(a;, y) tanh(('0i(t)) + x) tanh((V'i(t')) + v)- (57) 


Here P denotes a two-dimensional Gaussian distribution with zero mean. The mean of the 
held 'ipiit) follows from (27) as 


(Mt)) 



(l-g(r))Q(r 


1 ) 


hi. 


(58) 


Hence, we need to compute the corresponding covariance matrix which is dehned as 


- {AV)))) ■ 


(59) 
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In Appendix D we derive the expression 


C^{tR') = Q{t)Q{t') Y, 


Co + 1-Tn [A{x,y)]C{l,m) 

(1 - g(/))(l - g(m))Q(/ - l)Q(m - 1) ‘ 


(60) 


In this expression, for a power series f{x,y) = '^n k>o^nbkX'^y^ , we have introdnced the 
symbol Co^v.yk[f{x,y)] = Onbk for its coefficients. Moreover the function A is dehned as 

The function A has a relatively simple form for the three random matrix ensembles 
considered in this paper. For the SK and Hopheld models we have A{x,y) = xy/jd"^ 
and A{x,y) = xy/{l3‘^a), respectively. Moreover, for the random orthogonal model, from 
R“^(a;) = xKfl^ — x"^) we have Co^nyk[A{x,y)] = We next compare our 

simulations with theoretical results. We used the initializations m{t) = 0 for t G {0,1}, 
hence we assign C(1,0) = 0. In Figure 4 and 5 we show such a comparison above the 
AT line. Note, that no averaging over coupling matrices was used for the simulations. 
The integration over two-dimensional correlated Gaussian distribution used to calculate 
C{t,t — 1) was performed numerically. However the accuracy of the numerical method limits 
us for providing very precise results as t grows. In Figure 4 we illustrate the theoretical 
prediction of — 1) by the order parameter C(t,t — 1) for a large range of t. 

Below the AT line, the single-step memory algorithm diverges and simulated trajectories 
show strong sample fluctuations. However, by taking an average over a large number of 
trajectories we obtain a good agreement with the theory (see Figure 6). 


4-7. Asymptotic consistency with Cavity Variance 

In Section 4.2 we have demonstrated the convergence of the single step memory algorithm 
to the TAP equations. In a similar way one can show that (60) converges to the variance 
of the static held variance in (13) as t and t' tend to inhnity. Specihcally, by invoking 
the weak long-term response assumption (40) in (20) and following similar steps as in 
(Appendix D), for sufficiently large t, r < t, t' and r' < t' such that r/t and r'/t' being 
hnite as t and t' tend inhnity, one can show that 

^ T<t,T'<t' 

- [A((r, 2/)]R(1 - g)''R(l - qf (63) 

= ^ - 9),R(1 - «))• 


(64) 
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Figure 4: Above AT line: Comparison of the theory and simulation for /I = 20 and hi = 1 
N = 2^^ 



Figure 5: Above AT line: Comparison of the theory and simulation for /3 = 20 and hi = 1 
N = 
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Figure 6: Below AT line: Comparison of the theory and simulation for /S = 10 and hi = 2, 
N = 2^^. {■) j is obtained by 5 x 10^ realizations of J. 

In this expression, by abuse of notation, we denote limj^^a, A(a:, ?/) by A{x,x). Its explicit 
form is given by 

lim A(a:, y) = (R~^(x))^ = (R-^(x))2r'(R-^(x)). (65) 

(R AX))' 

Hence we have 

lim C*(«,(') = -^(R-'(R(1 - 9)))"R'(R-'(R(1 - «))) (66) 

(1 — q)^ 

= qK{l-q). (67) 

5. The stability of TAP fixed points 

In order to analyze the stability of the hxed points of the single step algorithm we resort 
to a linear stability analysis. We add Gaussian white noise to the dynamics, i.e. we set 
Aiit) Aiif) + ^iif) with {ei{t)A = e and discuss the limit e ^ 0. If the static TAP 
hxed point is stable, then the system should asymptotically show only small stationary 
huctuations around this and we can work in the Fourier domain. Hence, we assume 

C{t,t') = — [ dw C(a;)e*‘"(*-*') (68) 

27r J 


(69) 















Solving TAP Equations for Ising Models with General Invariant Random Matrices 17 


Inserting these Fonrier representations into (60), for large t and t' we may write (see 

(62)-(64)) 






(1 - g)2R(l - g)'-*-iR(l - q)m-t'-i 
C{u). 


A(e“*^R(l — q), e*‘^R(l — q)) 


(1 - 9 )= 


(70) 

( 71 ) 


For small noise e —)■ 0, the assumption of stability translates into small fluctuations around 
the static solution and we can write 


C{u) ~ 27rqd{u) + ec(a;) (72) 

C^p{u) ~ 27 rgR'(l — q)d{u) + ec^{u). (73) 


where we have separated fluctuations into static and dynamical parts. We will analyse the 
dynamical part next, but note, that also the static part q will have contributions from e. 
Thus for cu 7 ^ 0 we have 

A(e-*‘^R(1 - g),e*‘^R(l - g)) 

C^[UJ) = c{UJ) ---- 



where now the value of q is computed for e = 0. We next express c(a;) in terms of c^{u) 
for small e. The calculation in Appendix E is based on expanding 


C{t + 1, t T 1) 



(tanh(M(t) + hi) tanh(M(F) + hi))^ 


(75) 


up to hrst order in e. The brackets denote expectations over the two dimensional Gaussian 
held {u{t),u{t')) with {u{t)u{t')) ~ So+e{s{t—d)+6tq') fore —)■ 0, where Sq = gR'(l—g) and 
s{t — t') = c,p{t,t'). For t = t' the integral is over a single Gaussian only. The calculation 
shows that 

c(a;) = a(l + c^(a;)) (76) 

with a is dehned as in (55). Gombining this relationship with (74) we have 


c{u}) ~ a ( 1 — 


aA(e *^R(1 — g), e*‘^R(l — g)) 

(T^g? 


-1 


(77) 


In fact, for the SK, Hopheld and random orthogonal models, we have Co^nyk[A{x,y)] = 0 
Vn 7 ^ k. Therefore (77) is actually independent of u and from (65) we explicitly have that 


c(a;) 


a 

1 — q;R'( 1 — g) 


(78) 


In general, the right hand side of (77) must be non-negative to have a valid 
representation as a Fourier-transform of a time dependent correlation function. While 
the term A(e“*‘^R(l — g),e*‘^R(l — g)) is always positive, see (71), the second term is 
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small and positive for sufficiently small ft. But it changes sign and diverges. One expects 
that the divergence will occur hrst for the long range fluctuations, i.e. for the limit of low 
frequencies. Taking the limit yields 


lim cloj) 


a 

1 — aR'(l — q) 


(79) 


The condition 

aR'(l -q) = l (80) 

for the onset of instability agrees with the well-known AT stability criterion [4]. 


6. Discussion and Outlook 

In this paper we have presented a theoretical approach to the design of iterative algorithms 
for solving the TAP equations for Ising models with random couplings drawn from general 
invariant ensembles. We were guided by the idea that one needs to subtract terms from the 
internal held which depend on the values of the magnetizations at previous times. Using 
dynamical functional theory we have shown that in such a way, memory terms can be 
canceled and one arrives at a Gaussian distributed held, which eventually converges to the 
cavity held provided that a stability condition is fulhlled. We have presented a specihc 
method which we have called the ’Single Step Memory Construction’. Our approach may 
be extended in several ways. For example other subtraction methods are possible. One 
might design an alternative scheme, where the response function is required to be zero after 
one time step leading to a somewhat diherent algorithm and we will give details elsewhere. 
It would be interesting to see in which cases the explicit memory terms in the subtraction 
method can be simplihed by introducing auxiliary variables as is possible for the Hopheld 
model. Other extensions of our method would be to more general probabilistic models 
beyond the simple Ising case. This would include continuous random variables and other 
forms of interactions. An application to models of compressed sensing would be interesting 
where certain random matrix ensembles (such as the random orthogonal ones) might be 
natural models for the observation matrix. Specihcally one can trivially extend the random 
orthogonal ensemble by considering the more general spectrum such that the eigenvalues 
of J are distributed as a6{\ — 1) + (1 — q;)( 5(A + 1). In the context of compressed sensing 
this model coincides with the so-called random row-orthogonal ensemble [24], [25]. We will 
discuss details in a forthcoming publication. Finally, it would be important to address a 
drawback of our method which prevents an application to probabilistic inference problems 
with arbitrary data. Our subtraction scheme depends explicitly on the random matrix 
ensemble of couplings which may not be known in practice. Hence it would be interesting 
to develop schemes which adapt to the concrete data which would then achieve convergence 
to ‘adaptive TAP equations’ of [4], providing possible alternatives to the currently applied 
message passing algorithms [6]. 
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Appendix A. The proof of Ja — Atr(J) —)■ 0 


From (2) we have 


N N 

n=l n=l 


(A.l) 


where 0„ the nth column vector of Haar (orthogonal) O and Note that A is 

independent of O. For convenience we treat the realizations of A from its probability 
space and denote as A(ci;). We show the convergence for every realization Juiuj) = 
J2n=i The mean and variance of Jn{u)) are respectively given by 

N 

A«M (Ol) (A.2) 

n=l 

N 

Var[Jii(a;)] = ^ A^(a;)Var[O^J + ^n(a;)Afc(a;)Cov[Of„, 0^^] (A.3) 

n=l n<k 


where for the random variables X and Y, Var[A] and Cov[A, Y] denoting the variance X 
and the covariance of X and Y, respectively. For the proof, we basically need to show that 


lim Var[Jjj(a;)] = 0. (A.4) 

N^oo 


To do this we make use of the so-called (orthogonal) Weingarten calculus that allows to 
for calculate joint moments of Haar entries (analogeous to Wick calculus for Gaussian 
matrices). For details we refer the reader to [26, 27]. From [27, Theorem 2.1, Example 2.1] 
we have 

(OlOlk) = III . (A.5) 

[ N{N+2) “ 

Furthermore, we have (07) = 1/A^, Vi, j. Thus, the variance (A.3) reads 


Var[Jii(a;)] 


2(A- 1) 
N^{N + 2) 


N 

n=l 


4 

N^{N + 2){N -1) 


^ ^ (cj) A/i; (cj). 

n<k 


(A.6) 


Note that the spectrum J is assumed to converge almost surely to a compactly supported 
probability distribution such that the smallest and largest eigenvalue of J converge (almost 
surely) to the inhmum and supremum of the compact support, respectively. This implies 
that the minimum and maximum of the eigenvalues of J are uniformly bounded above for 
a sufficiently large N. This is a sufficient to get (A.4) from (A.6). Thereby we complete 
the proof. 
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Appendix B. The self-averaging limit of adaptive TAP Equations 


We will provide a derivation of the TAP equations (11)-(12) from the ‘adaptive TAP’ 
approach of [4], Under the assumption of Gaussian distributed cavity helds and an 
approximate linear response argument one hnds 


m,; = 


tanh Jij^j ~ Viiri^ 


Vi = An - 

An 

An = Vi + 


1 — mj 


(B.l) 

(B.2) 

(B.3) 


with the positive dehnite matrix x — ~ ■ 

To obtain the TAP equations in (11)-(12) we basically need to show that U ~ R(1 — g). 
To that end we write (B.2) in the form of 




/ (91ndet(A — J) 

V dAn 



(B.4) 


Our basic idea is to simplify In det(A — J) using results of free probability theory for random 
matrices. To that end we invoke an additional assumption that the empirical distribution 
function of {An, • • • jAatat} G converges weakly and almost surely to a compactly 
supported probability distribution as A —)■ cx). Since J is asymptotically invariant and 
having a compactly supported limiting spectrum, A and J are asymptotically (almost 
surely) free [26]. Thus, due to the uniform convergence property of the R-transform, see 
[17, Lemma 3.3.4], for a sufficiently large N we have 


= Rj;(x)-Rj'(-x) 


(B.5) 

(B.6) 


where we denote the R-transform of the spectrum of an A x A symmetric matrix X by 
R;^. Note that we have 1 — q = ;^tr(A — J)“^. Then, from Lemma 1 below we have 


— lndet(A - J) = -(1 + ln(l - q)) + 
~ -(1 + ln(l - q)) + 


»l-q 


den R^^( 


-UJ] 


(B.7) 


.l-g 


den R]^(—cn) — 


-l-g 


dcnR;;(a;) (B.8) 


= — In det(A - UI) + (1 - q)V - 


»l-g 


den Rj (en) 


(B.9) 


where V is defined through the implicit equation (1 — g) = T A'^-v implies 

that tr(A — J)~^ — tr(A — UI)"^. Here, from (B.8) to (B.9) we make use of the identity 
(B.20) below. Note that for a non-negative A x A matrix X we can write the Stieltjes 
transform of the eigenvalue distribution of X, say P^, as M^(en) = f dP^(x)/(en — x) 
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with uj G (— 00 , 0 ). Then, from (B.5) we use the subordination property [28, Chapter 22] 
as ~ M^{uj + Rj and take the limit cu —)■ 0. Doing so yields 

V ~ R(1 — q) where we note that R(a:) = hm 7 v^.oo Notice also that 


lndet(A-l/I) 

dAii 


A,, - V 


N{l-q) 


dV 

dAii 


(B.IO) 


Hence, we hnally obtain 


91ndet(A —J) 1 
dAii An - V 


(B.ll) 


which yields V) ~ 1/ = R(1 — g). 

Lemma 1 Let an N x N matrix X he positive definite. Let Q = ^tr(JC“^). Then, 

1 

— Indet(X) =-(1+ lng) + / dw R^(-a;). (B.12) 

A Jo 

Proof 1 Note that 

Indet(X) = lim In det(e“^I + X). (B.13) 

For convenience let r]{e) = ;^tr((I + eX)“^) for e > 0. Since X is positive definite we can 
write [29] 

(B.14) 

er/(e) 

Applying the substitution oj = tri(t) to the following integral we have 



do; R;^(—a;) 



r]{t)+tr]'{t) 


tr]{t) 


[1 - m 


v'jt) 

7]{t) 


{l-r]{t))+ / dt 


1 - yjt) 

t 


\nri{e) + 1 — 77 (e) + f dt 

Jo t 

In 77 (e) + 1 — ri{e) + lndet(I + eX) 
Ing(e) + 1 - 77 (e) + ^ lndet(e“^I + X) 


with g(e) = e 77 (e). In other words we have 


N 


In det(e 


rQid 

-'I + X) = ( 77 (e)- 1 -InQ(e)) + 

Jo 


da;R;^(—cu). 


(B.15) 

(B.16) 

(B.17) 

(B.18) 

(B.19) 

(B.20) 


Taking the limit e ^ 00 we complete the proof. 
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Appendix C. Derivation of DFT Results 

For the sake of notational compactness let 

g({m(r),7(r)}*^o) - - / ({7(r), m(r)}*-^o)). (C.l) 

By the Fourier representation of the Dirac function we write 


mm 



dm(t)d7(t)d7(t) 


(C.2) 


The derivation is separated into two parts: i) disorder average; ii) the saddle point method. 


Appendix C.l. disorder average 

For convenience let us introduce N x T matrices X and X with Xnt 
Xnt = We need to evaluate 


mn(t+l) 

Vn 


and 





~ e 2 




(C.3) 


with Q = XX^ + Xx\ Here (C.3) follows directly from (3)-(7). We will evaluate 


tr(Q^) = tr + XxV) . 

(C.4) 

in terms of the matrices (C.4) 


g = x^x 

(C.5) 


c = x^x 

(C.6) 


c = x^x 

(C.7) 

Then by using cyclic invariance 

of the trace we obtain the expression 


tr(Q”) = 2tr(^;” 

n—2 

) + + \{g,c,c ), 

k=f) 

(C.8) 

where the function I satishes 





d\{g,c,c) 

dC 

II 

o 

O 

(C.9) 


This means that I contains more than one factor C and will thus-at the saddle-point 
value-C = 0 not contribute to saddle-point equations. 
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Appendix C.2. The saddle point calculation 
We write as 






, T-l 

n 

t=0 


5 (iNQit, s) — m(t)^ 7 (s)) S [NC{t, s) — m(t)^m(s)) 6 (^NC{t, s) + 7(^)H(s) j • 

t,S 

(C.IO) 


By the Fourier representation of Dirac function we write the last line of (C.IO) as 

f. j PiQPiCdC e^-irs) iG{t,s)(^iNg{t,s)-m{t)ms))+iC{t,s)[NC{t,s)-m{t)^m.{s))-C{t,s)(^NC{t,s)+■y{t)^{s)) 

(C.ll) 

Here c is a constant irrelevant for the saddle point calculation. We define the auxiliary 
single-site partition function 

Z^iln,g,C,C) ^ / n {dm„(t)d7.(t)d7n(t) g({m„(r),} 

t=o 

In this way we can write (C.IO) as 

{Z({((()})>J-/d6d6dCdCdCdCefE"i. s;."{e‘c(5t)»->-‘<;}+i(5,c,<;)) 

Q^'E,(t,s)[-G{t,s)Gt,s+iC{t,s)C(t,s)-C{t,s)C{t,s)]+J2ri log Zn{ln,9,Cfi) 


In the large N limit we can perform the integrations over Q, Q, C, C, C, C with the saddle 
point methods. Doing so yields: 


S) = -^Y 1 (^n{t)%{s)) 2^ 

n 

(C.14) 

s) = (^n{t)mn{s)) 2^ 

n 

(C.15) 

C(C S) = -^ {ln{t)ln{s)) 2 ^ ■ 

(C.16) 


n 


with ( ) 2 ^ denoting the average with respect to the single-site partition function. Here the 
quantity (C.16) has only the trivial solution that Ct^s = 0. Other solutions may violate the 
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normalization Z{{l{t) = 0}) = 1. Furthermore this solution leads to C = 0. By invoking 
(C.8) we have 


e? = RiG) 

^ oo n—2 




\n—2—k 

y 

n=l k=0 

Thus, for large N we get the factorization of the generating function 

N ^ T-1 


(C.17) 

(C.18) 


(Z({/(t)}))j ~n / n g(Wn(r),7n(r)}^<Je 




n=l t=0 


P*7nh)(7n(d-^n-Es5h:S)"ln(^f))„-Es^h.«)7«W7n(s) 


(C.19) 


We linearize the quadratic terms in 7n(t) by introducing auxiliary Gaussian random fields 
(pn{t) which are iid for each n with zero mean and covariance C^{t,s) = 2C{t,s) = 
(0„(t)0„(s)). In this way we can write 






(C.20) 


Doing so leads (C.19) to 


N 


T-1 


{Z{{l{t)}))j ^ n / d^({0„(t)};O,C0) Yl {dmn{t)d-fn{t) d{mn{t) - /i 7„(r)}^Jo) 


72=1 ' 


t=0 


- hn -'^Git, 
V S<t 


s)mn{s) - 4>n{t) I 

(C.21) 


Finally, notice that —i{'mn{t)^n{s)) Thus, G equals the response-function 
(21). This completes the derivation. 


Appendix D. Derivation of equation (60) 
First note that from (27) and (30) we have 

4(t-l,F-l) = 


{GCpG^){tR') 


(l-g(f))(l-g(f)) 


(D.l) 


where Cp is defined as in (20). For sake of compactness we let /(x) = R“^(x). By 
elementary combinatorics and using ^ = /(^) we can show that any power of the matrix 
G can be written as 


G^ = Y.CoAf{^f)G'^- 


(D.2) 


72=1 
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where for any power series f{x) = we represent the coefficient via the definition 

Ok = Co^k{f{x)). This means that we have 


t-i 


g'^ihr) = Co,.-.{f{x)'^)llg{s + Gs). 


(D.3) 


Hence we also get 

(a*+‘c(6')”"‘“*)(*-i.('-i)= E (ilds + i.s)] rna(s + i.. 

l<t,7n<t' \s=l / \s=m 


(D.4) 


where we have extended the dehnitions of coefficients Coxr<.yk[f{x,y)] = Onbk to double 
power series f{x,y) = J2nk>o^nbkx"‘y^. Summing up the geometric series, we have 


n—2 


= Co,.-,,, 


k=0 


fivT ^ \ 


and 

OO 

Y1 


n=l 


I = ^O^t-U,t'-n 




y-x 


l/fe) -/Wl/l/W/(!/)l. 

(D.6) 


’ V [/(!/)-/Wl/I/W/Ml 

Putting everything together completes the proof. 

Appendix E. Derivation of equation (76) 

For the sake of compactness, without loss of generality, we may set hi = h. Using the 
representation of the Gaussian density in terms of the characteristic function we have the 
expansion for t ^ t' 

(tanh(M(t) + h) tanh(M(F) + h))^ ~ 

f duidu 2 dkidk 2 exp i{kiUi + ^2^2) - ^{so + ^{s(t, t) + l)}(/c? + kj) - {sq + ^s(t, t')}kik 2 


X tanh(Mi + hi) tanh(M2 + hi) 
~ g — - / dMidu 2 dA;idA ;2 exp 


i{kiUi + k 2 U 2 ) - ^(^1 + ^2 + 2/C1/C2) 


X 


X tanh(Mi + h) tanh(M 2 + h) {{s{t, t) + l)(/ci + kl) + 2s{t, t')kik 2 } 
= , + .(.((,0 + 1) (ta„h(« + 


(E.l) 
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The last line is obtained by representing kik 2 etc by derivatives with respect to Ui and U 2 - 
Repeating a similar equation for t = t', we get 


(tanh^(M(t) + 

= q + e{s{t,t) + 1) 


e, . , _ /tanh^fw + h) 

=,+2wu)+i)(— 

ta„h(« + + M ^ ^ 3tanM« + h) 



(E.2) 


Both expansions can be represented in the single equation 
d (tanh(M(t) + h) tanh(M(t') + h)) 


de 


= {s(t, t) + 1) ^tanh(M + h) 


8“^ tanh(M + h) 


du^ 


+ {s{t, t') + ( ( 


f 9tanh(M + h) 


du 


(E,3) 


Note, that only the second term contributes to the dynamic part of the fluctuations. Hence, 
by taking the Fourier transform and noting that ^ _ tanh^(M + h) the result 

is obtained. 
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